 spike protein




A SARS-CoV-2 Interaction Dataset and VHH Sequence Corpus for Antibody Language Models

Tsuruta, Hirofumi, Yamazaki, Hiroyuki, Maeda, Ryota, Tamura, Ryotaro, Imura, Akihiro

arXiv.org Artificial Intelligence

Antibodies are crucial proteins produced by the immune system to eliminate harmful foreign substances and have become pivotal therapeutic agents for treating human diseases. To accelerate the discovery of antibody therapeutics, there is growing interest in constructing language models using antibody sequences. However, the applicability of pre-trained language models for antibody discovery has not been thoroughly evaluated due to the scarcity of labeled datasets. To overcome these limitations, we introduce AVIDa-SARS-CoV-2, a dataset featuring the antigen-variable domain of heavy chain of heavy chain antibody (VHH) interactions obtained from two alpacas immunized with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike proteins. AVIDa-SARS-CoV-2 includes binary labels indicating the binding or non-binding of diverse VHH sequences to 12 SARS-CoV-2 mutants, such as the Delta and Omicron variants. Furthermore, we release VHHCorpus-2M, a pre-training dataset for antibody language models, containing over two million VHH sequences. We report benchmark results for predicting SARS-CoV-2-VHH binding using VHHBERT pre-trained on VHHCorpus-2M and existing general protein and antibody-specific pre-trained language models. These results confirm that AVIDa-SARS-CoV-2 provides valuable benchmarks for evaluating the representation capabilities of antibody language models for binding prediction, thereby facilitating the development of AI-driven antibody discovery.


Vaxformer: Antigenicity-controlled Transformer for Vaccine Design Against SARS-CoV-2

Gema, Aryo Pradipta, Kobiela, Michał, Fraisse, Achille, Rajan, Ajitha, Oyarzún, Diego A., Alfaro, Javier Antonio

arXiv.org Artificial Intelligence

Motivation: The SARS-CoV-2 pandemic has emphasised the importance of developing a universal vaccine that can protect against current and future variants of the virus. Results: The present study proposes a novel conditional protein Language Model architecture, called Vaxformer, which is designed to produce natural-looking antigenicity-controlled SARS-CoV-2 spike proteins. We evaluate the generated protein sequences of the Vaxformer model using DDGun protein stability measure, netMHCpan antigenicity score, and a structure fidelity score with AlphaFold to gauge its viability for vaccine development. Our results show that Vaxformer outperforms the existing state-of-the-art Conditional Variational Autoencoder model to generate antigenicity-controlled SARS-CoV-2 spike proteins. These findings suggest promising opportunities for conditional Transformer models to expand our understanding of vaccine design and their role in mitigating global health challenges.


ML-developed 'Pan-variant' COVID vaccine - BLOCKGENI

#artificialintelligence

According to MIT researchers, booster injections and seasonal variant doses may become obsolete thanks to a novel approach to vaccination that incorporates machine learning. By targeting infected cells rather than the virus itself, this "pan-variant" vaccine would quickly bring infections under control. To be clear, it is currently undergoing animal testing and has not yet been put into use. As COVID becomes a resident virus in the human population, remedies that last longer than sporadic boosters for particularly problematic strains are required. The issue is that mRNA vaccines, however effective, are reactive rather than proactive: after spotting a variant, you sample its spike protein or another distinguishing property and introduce it to the immune system so that it is alerted to look out for it.


Did AI Just Help Us Discover a Universal COVID Vaccine?

#artificialintelligence

The world is continuing to learn how to live with COVID-19, but we're still caught under the alarming specter of new variants. A single new strain with more ferocious infectivity and danger could send the world back into a public health emergency, including renewed lockdowns and a resurgence in remote work and education. To that end, there's been no shortage of work among researchers to design a COVID vaccine that could safeguard people from the whole spectrum of variants--current and future alike. Several institutions are already racing to develop a universal COVID jab, and now researchers at MIT think they may have found one that could get us across the finish line. If it does end up passing all the tests and making its way into our arms, you'll have AI to thank.


Characterizing SARS-CoV-2 Spike Sequences Based on Geographical Location

Ali, Sarwan, Bello, Babatunde, Tayebi, Zahra, Patterson, Murray

arXiv.org Artificial Intelligence

With the rapid spread of COVID-19 worldwide, viral genomic data is available on the order of millions of sequences on public databases such as GISAID. This Big Data creates a unique opportunity for analysis towards the research of effective vaccine development for current pandemics, and avoiding or mitigating future pandemics. One piece of information that comes with every such viral sequence is the geographical location where it was collected -- the patterns found between viral variants and geographical location surely being an important part of this analysis. One major challenge that researchers face is processing such huge, high-dimensional data to obtain useful insights as quickly as possible. Most of the existing methods face scalability issues when dealing with the magnitude of such data. In this paper, we propose an approach that first computes a numerical representation of the spike protein sequence of SARS-CoV-2 using $k$-mers (substrings) and then uses several machine learning models to classify the sequences based on geographical location. We show that our proposed model significantly outperforms the baselines. We also show the importance of different amino acids in the spike sequences by computing the information gain corresponding to the true class labels.
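The k-mer representation mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of k = 3, the 20-letter amino-acid alphabet, and the toy sequence are all assumptions made for illustration. The sketch maps a protein sequence to a fixed-length vector of overlapping k-mer counts, which is the kind of numerical input a downstream classifier could consume.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids; real data may contain other symbols (e.g. X).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(sequence, k=3):
    """Count every overlapping length-k substring (k-mer) of the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_vector(sequence, k=3, alphabet=AMINO_ACIDS):
    """Map a sequence to a count vector over all len(alphabet)**k possible k-mers.

    The vector has the same length and k-mer ordering for every sequence,
    so vectors from different sequences are directly comparable.
    k-mers containing symbols outside the alphabet are ignored.
    """
    counts = kmer_counts(sequence, k)
    vocabulary = ("".join(p) for p in product(alphabet, repeat=k))
    return [counts.get(kmer, 0) for kmer in vocabulary]


if __name__ == "__main__":
    toy_sequence = "MFVFLVLLPLVSSQ"  # hypothetical toy string, not real dataset input
    vector = kmer_vector(toy_sequence, k=3)
    print(len(vector), sum(vector))
```

The vector length grows as 20^k, so for larger k a sparse representation would likely be preferable to the dense list used here.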


Using machine-learning to distinguish antibody targets

AIHub

The virus's spike proteins (purple) are a key antibody target, with some antibodies attaching to the top (darker purple) and others to the stem (paler zone). A new study shows that it is possible to use the genetic sequences of a person's antibodies to predict what pathogens those antibodies will target. "Our research is in a very early stage, but this proof-of-concept study shows that we can use machine learning to connect the sequence of an antibody to its function," said Nicholas Wu, a professor of biochemistry at the University of Illinois Urbana-Champaign who led the research with biochemistry PhD student Yiquan Wang; and Meng Yuan, a staff scientist at Scripps Research in La Jolla, California. With enough data, scientists should be able to predict not only the virus an antibody will attack, but which features on the pathogen the antibody binds to, Wu said. For example, an antibody may attach to different parts of the spike protein on the SARS-CoV-2 virus.


A machine learning model that could identify antibody targets

#artificialintelligence

Using a machine learning model, scientists could predict not only the virus an antibody will attack, but which features on the pathogen the antibody binds to. A new study by the University of Illinois Urbana-Champaign, US has shown that by using machine learning, it is possible to use the genetic sequences of a person's antibodies to predict what pathogens those antibodies will target. Recently published in Immunity, the new approach successfully differentiates between antibodies against influenza and those attacking SARS-CoV-2. The virus's spike proteins (purple) are a key antibody target, with some antibodies attaching to the top (darker purple) and others to the stem (paler zone) [Credit: Graphic by Yiquan Wang]. "Our research is in a very early stage, but this proof-of-concept study shows that we can use machine learning to connect the sequence of an antibody to its function," said Professor Nicholas Wu.


Protein structure prediction using AlphaFold2

#artificialintelligence

My name is Dima, and here I want to share my small project. It is about applying a deep-learning tool to protein structure prediction. In late December 2021 I was lucky to find an online internship in the field of bioinformatics. It was the NyBerMan Merit Internship from LLBio-IT School, and the main focus was, surprisingly (not), Covid investigation. After some technical interviews and huge competition (nearly 1,000 participants for 20 places), I was planning the next weeks of learning and doing.